Apache Spark
Bridging Emotions and Architecture: Sentiment Analysis in Modern Distributed Systems
Shah, Mahak, Hazarika, Akaash Vishal, Malhotra, Meetu, Patil, Sachin C., Mohanty, Joshit
Sentiment analysis is a field within NLP that has gained importance because it is applied in areas such as social media surveillance, customer feedback evaluation, and market research. At the same time, distributed systems allow for effective processing of large amounts of data. This paper therefore examines how sentiment analysis converges with distributed systems, concentrating on different approaches, challenges, and directions for future investigation. Furthermore, we conduct an extensive experiment in which we train sentiment analysis models on both a single-node configuration and a distributed architecture to bring out the benefits and shortcomings of each method in terms of performance and accuracy.
- North America > United States > Virginia > Norfolk City County > Norfolk (0.04)
- North America > United States > North Carolina > Wake County > Raleigh (0.04)
- North America > United States > New York > New York County > New York City (0.04)
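The single-node versus distributed comparison in the abstract rests on a simple observation: scoring documents for sentiment is embarrassingly parallel, so a corpus can be partitioned, scored per partition on separate workers, and merged. A minimal plain-Python sketch of that idea (the toy lexicon and the thread-based partitioning are illustrative assumptions, not the paper's model, which would run on Spark executors):

```python
from concurrent.futures import ThreadPoolExecutor

# Tiny illustrative sentiment lexicon -- not a real model.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def score(text):
    """Single-node scoring: positive word count minus negative word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def score_partition(docs):
    # Each "node" scores its own partition independently.
    return [score(d) for d in docs]

def distributed_score(docs, partitions=4):
    """Partition the corpus, score partitions in parallel, merge in order."""
    n = max(1, -(-len(docs) // partitions))  # ceiling division for chunk size
    chunks = [docs[i:i + n] for i in range(0, len(docs), n)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = pool.map(score_partition, chunks)
    return [s for part in results for s in part]

docs = ["great product love it", "terrible poor support", "good but bad delivery"]
```

Because contiguous chunks are merged in order, the distributed result matches the single-node result exactly; in the paper's setting the trade-off is instead between per-node throughput and coordination overhead.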
Real-time stress detection on social network posts using big data technology
Nguyen, Hai-Yen Phan, Ly, Phi-Lan, Le, Duc-Manh, Do, Trong-Hop
In the context of modern life, particularly in Industry 4.0 and the online space, emotions and moods are frequently conveyed through social media posts. The trend of sharing stories, thoughts, and feelings on these platforms generates a vast and promising data source for Big Data. This creates both a challenge and an opportunity for research into applying technology to develop more automated and accurate methods for detecting stress in social media users. In this study, we developed a real-time system for stress detection in online posts, using "Dreaddit: A Reddit Dataset for Stress Analysis in Social Media," which comprises 187,444 posts across five different Reddit domains. Each domain contains texts with both stressful and non-stressful content, showcasing various expressions of stress. A labeled dataset of 3,553 lines was created for training. Apache Kafka, PySpark, and Apache Airflow were used to build and deploy the model. Logistic Regression yielded the best results on new streaming data, achieving an accuracy of 69.39% and an F1-score of 68.97%.
- Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
- Asia > China > Hong Kong (0.04)
- Information Technology (0.83)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)
- Information Technology > Data Science > Data Mining > Big Data (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
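Setting the streaming and orchestration pieces (Kafka, Airflow) aside, the classifier at the core of the system above is plain logistic regression over text features. A self-contained sketch of that idea, trained with simple gradient descent on invented toy posts (the vocabulary, examples, and hyperparameters are illustrative assumptions, not the paper's setup):

```python
import math

# Fixed illustrative vocabulary; real systems would use TF-IDF or hashing.
VOCAB = ["stressed", "anxious", "overwhelmed", "deadline",
         "calm", "relaxed", "happy", "vacation"]

def featurize(text):
    """Binary bag-of-words over the fixed vocabulary."""
    words = text.lower().split()
    return [float(w in words) for w in VOCAB]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=200, lr=0.5):
    """Plain stochastic gradient descent for logistic regression."""
    w = [0.0] * len(VOCAB)
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, text):
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, featurize(text))) + b)
    return 1 if p >= 0.5 else 0

# Invented training posts: 1 = stressed, 0 = not stressed.
train_set = [(featurize(t), y) for t, y in [
    ("so stressed about my deadline", 1),
    ("feeling anxious and overwhelmed", 1),
    ("calm and relaxed today", 0),
    ("happy about my vacation", 0),
]]
w, b = train(train_set)
```

In the paper's architecture this scoring step would sit behind a Kafka consumer and be applied to each incoming post as it streams in.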
Distributed Record Linkage in Healthcare Data with Apache Spark
Heydari, Mohammad, Sarshar, Reza, Soltanshahi, Mohammad Ali
Healthcare data is a valuable resource for research, analysis, and decision-making in the medical field. However, healthcare data is often fragmented and distributed across various sources, making it challenging to combine and analyze effectively. Record linkage, also known as data matching, is a crucial step in integrating and cleaning healthcare data to ensure data quality and accuracy. Apache Spark, a powerful open-source distributed big data processing framework, provides a robust platform for performing record linkage tasks with the aid of its machine learning library. In this study, we developed a new distributed data-matching model based on the Apache Spark Machine Learning library. To ensure the correct functioning of our model, a validation phase was performed on the training data. The main challenge is data imbalance: a large amount of data is labeled false, and only a small number of records are labeled true. Using SVM and regression algorithms, our results demonstrate that the models neither over-fit nor under-fit the data, which shows that our distributed model works well.
- Asia > Middle East > Iran > Tehran Province > Tehran (0.05)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Consumer Health (1.00)
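Record linkage at scale typically combines blocking, which avoids comparing every possible pair of records, with a per-pair similarity score that a classifier or threshold then turns into a match decision. A plain-Python sketch of that core logic (the field names, blocking key, and threshold are illustrative assumptions, not the paper's model):

```python
from collections import defaultdict
from difflib import SequenceMatcher

def block_key(record):
    # Blocking: only compare records that share a cheap key
    # (first letter of surname + birth year) instead of all O(n^2) pairs.
    return (record["surname"][:1].lower(), record["birth_year"])

def similarity(a, b):
    """Average fuzzy string similarity over surname and given name."""
    name_sim = SequenceMatcher(None, a["surname"].lower(), b["surname"].lower()).ratio()
    given_sim = SequenceMatcher(None, a["given"].lower(), b["given"].lower()).ratio()
    return (name_sim + given_sim) / 2

def link(records_a, records_b, threshold=0.85):
    """Return (id_a, id_b) pairs whose similarity clears the threshold."""
    blocks = defaultdict(list)
    for r in records_b:
        blocks[block_key(r)].append(r)
    matches = []
    for r in records_a:
        for cand in blocks.get(block_key(r), []):
            if similarity(r, cand) >= threshold:
                matches.append((r["id"], cand["id"]))
    return matches

source_a = [{"id": "A1", "surname": "Smith", "given": "John", "birth_year": 1980},
            {"id": "A2", "surname": "Jones", "given": "Mary", "birth_year": 1975}]
source_b = [{"id": "B1", "surname": "Smyth", "given": "John", "birth_year": 1980},
            {"id": "B2", "surname": "Brown", "given": "Anna", "birth_year": 1975}]
```

In a Spark setting, the blocking key becomes the partitioning/join key, and the thresholding step is replaced by the trained SVM or regression model the paper describes.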
[100%OFF] Machine Learning with Apache Spark 3.0 using Scala
Fundamental knowledge of Machine Learning with Apache Spark using Scala. Learn and master Machine Learning through hands-on projects, then run them on the Databricks cloud computing service. You will build four Apache Spark Machine Learning projects and explore Apache Spark and Machine Learning on the Databricks platform. Note: 100% OFF Udemy coupon codes are valid for a maximum of 3 days only.
First Steps in Machine Learning with Apache Spark
Apache Spark is one of the main tools for data processing and analysis in the Big Data context. It's a very complete (and complex) data processing framework, with functionalities that can be roughly divided into four groups: Spark SQL & DataFrames, for all-purpose data processing; Spark Structured Streaming, for handling data streams; Spark MLlib, for machine learning and data science; and GraphX, the graph processing API. I've already featured the first two in other posts: creating an ETL process for a Data Warehouse, and integrating Spark and Kafka for stream processing. Today it's time for the third one: let's play with Machine Learning using Spark MLlib. Machine Learning has a special place in my heart, because it was my entrance door to the data science field and, like probably many of you, I started with the classic Scikit-Learn library.
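For readers coming from Scikit-Learn, the easiest bridge into Spark MLlib is its estimator/transformer pattern: an Estimator's `fit()` learns from data and returns a Transformer, and a Pipeline chains stages so each is fitted on the output of the previous one. A dependency-free sketch of that pattern (names mirror MLlib's, but this is plain Python over lists of rows, not the MLlib API itself):

```python
class StandardScaler:
    """Estimator: fit() learns column means/stds and returns a Transformer."""
    def fit(self, rows):
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(dims)]
        stds = [(sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 or 1.0
                for j in range(dims)]  # guard against zero variance
        return StandardScalerModel(means, stds)

class StandardScalerModel:
    """Transformer produced by fitting: applies the learned scaling."""
    def __init__(self, means, stds):
        self.means, self.stds = means, stds
    def transform(self, rows):
        return [[(v - m) / s for v, m, s in zip(r, self.means, self.stds)]
                for r in rows]

class Pipeline:
    """Chains estimators: each stage is fitted on the previous stage's output."""
    def __init__(self, stages):
        self.stages = stages
    def fit(self, rows):
        models = []
        for stage in self.stages:
            model = stage.fit(rows)
            rows = model.transform(rows)
            models.append(model)
        return PipelineModel(models)

class PipelineModel:
    def __init__(self, models):
        self.models = models
    def transform(self, rows):
        for m in self.models:
            rows = m.transform(rows)
        return rows

model = Pipeline([StandardScaler()]).fit([[1.0, 2.0], [3.0, 4.0]])
```

In real MLlib code the rows would be columns of a DataFrame and the stages things like `Tokenizer`, `VectorAssembler`, or `LogisticRegression`, but the fit/transform contract is the same.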
Data Engineering and Machine Learning using Spark
Organizations need skilled, forward-thinking Big Data practitioners who can apply their business and technical skills to unstructured data such as tweets, posts, pictures, audio files, videos, sensor data, and satellite imagery to identify the behaviors and preferences of prospects, clients, competitors, and others. In this short course you'll gain practical skills as you learn how to work with Apache Spark for Data Engineering and Machine Learning (ML) applications. You will work hands-on with Spark MLlib, Spark Structured Streaming, and more to perform extract, transform, and load (ETL) tasks as well as regression, classification, and clustering. The course culminates in a project where you will apply your Spark skills to an ETL-for-ML workflow use case. NOTE: This course requires foundational skills for working with Apache Spark and Jupyter Notebooks.
- Education > Educational Technology > Educational Software > Computer Based Training (0.40)
- Education > Educational Setting > Online (0.40)
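The ETL-for-ML workflow the course describes reduces to three steps: extract raw records, transform them (clean and cast), and load them into a shape a model can consume. A minimal stdlib sketch of that flow (the sensor CSV schema is invented for illustration; in the course these steps would operate on Spark DataFrames):

```python
import csv
import io

# Invented raw input: a sensor feed with one missing reading.
RAW = """device_id,temp_c,status
a1,21.5,ok
a2,,ok
a3,19.0,fault
"""

def extract(text):
    # Extract: parse the raw CSV into a list of dicts.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Transform: drop rows with missing readings and cast types.
    out = []
    for r in rows:
        if r["temp_c"]:
            out.append({"device_id": r["device_id"],
                        "temp_c": float(r["temp_c"]),
                        "status": r["status"]})
    return out

def load(rows):
    # Load: here, aggregate readings into a table keyed by status,
    # the kind of feature table an ML step would consume next.
    table = {}
    for r in rows:
        table.setdefault(r["status"], []).append(r["temp_c"])
    return table

features = load(transform(extract(RAW)))
```

The Spark versions of these steps (`spark.read.csv`, `dropna`/`withColumn`, `groupBy`) scale the same logic across a cluster.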
How to Install Spark NLP. A step-by-step tutorial on how to make…
Apache Spark is an open-source framework for fast, general-purpose data processing. It provides a unified engine that can run complex analytics, including Machine Learning, in a fast and distributed way. Spark NLP is an Apache Spark module that provides advanced Natural Language Processing (NLP) capabilities to Spark applications. It can be used to build complex text processing pipelines, including tokenization, sentence splitting, part-of-speech tagging, parsing, and named entity recognition. Although the documentation describing how to install Spark NLP is quite clear, you can sometimes get stuck during installation.
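The pipeline stages listed above (sentence splitting, tokenization, and so on) compose in sequence, with each stage annotating the output of the previous one. A dependency-free sketch of that chaining (the regex rules are deliberately naive stand-ins; Spark NLP's annotators are far more robust and run distributed over DataFrames):

```python
import re

def split_sentences(text):
    # Naive sentence splitter: break after ., !, or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Words and punctuation become separate tokens.
    return re.findall(r"\w+|[^\w\s]", sentence)

def pipeline(text):
    # Each stage consumes the previous stage's annotations,
    # mirroring how Spark NLP chains annotators.
    return [tokenize(s) for s in split_sentences(text)]
```

Further stages (part-of-speech tagging, named entity recognition) would slot in the same way, each taking the token lists as input.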
AWS re:Invent 2022: Data and Machine Learning
On the second day of Amazon Web Services (AWS) re:Invent, Swami Sivasubramanian, vice president of AWS Data and Machine Learning (ML), revealed the latest innovations during his keynote. To start, Sivasubramanian announced the launch of Amazon Athena for Apache Spark, which he said will provide organizations with a more intuitive way to run complex data analytics. He noted that Apache Spark will run three times faster on AWS. The next announcement was the general availability of Amazon DocumentDB Elastic Clusters, a fully managed solution to quickly scale document workloads of any size. Amazon SageMaker now supports geospatial ML, giving access to multiple new kinds of data.
- Information Technology (0.37)
- Education (0.34)
Automating Digital Pathology with Machine Learning
With technological advancements in imaging and the availability of new, efficient computational tools, digital pathology has taken center stage in both research and diagnostic settings. Whole Slide Imaging (WSI) has been at the center of this transformation, enabling us to rapidly digitize pathology slides into high-resolution images. By making slides instantly shareable and analyzable, WSI has already improved reproducibility and enabled enhanced education and remote pathology services. Today, digitization of entire slides at very high resolution can occur inexpensively in less than a minute. As a result, more and more healthcare and life sciences organizations have acquired massive catalogues of digitized slides.
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.48)
Employee Attrition Prediction in Apache Spark (ML) Project ($19.99 to FREE)
A Spark Machine Learning project (Employee Attrition Prediction) for beginners, using a Databricks notebook (unofficial, Community Edition server). In this data science machine learning project, we will build an employee attrition prediction model using the Decision Tree classification algorithm, one of the predictive models.
- Education > Educational Technology > Educational Software > Computer Based Training (0.40)
- Education > Educational Setting > Online (0.40)
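A decision tree classifier like the one this project uses grows by repeatedly choosing the feature/threshold split that most reduces class impurity. The core of a single split can be sketched in plain Python with the Gini criterion (the attrition features below are invented for illustration; Spark MLlib's `DecisionTreeClassifier` applies the same idea at scale):

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows, labels):
    """Try every (feature, threshold) pair; keep the lowest weighted impurity."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best[1], best[2]

# Invented toy data: [overtime_hours, years_at_company], label 1 = attrited.
rows = [[60, 1], [55, 2], [20, 8], [25, 10]]
labels = [1, 1, 0, 0]
```

A full tree recursively applies `best_split` to each side until the leaves are pure or a depth limit is reached; on this toy data a single split on overtime hours already separates the classes perfectly.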